The Motivation behind Checkbnb

Under new regulatory guidelines in Amsterdam, the Airbnb market is increasingly challenging to navigate. As of July 2020, the City of Amsterdam outright banned Airbnbs in three of Amsterdam’s city center neighborhoods, limited the number of nights that a host can rent out their property, limited the number of guests that a property can host, and enacted a permitting fee. Checkbnb, an affiliate of Airbnb, allows prospective hosts to calculate the potential revenue of renting their property given the property’s home specifications, listing details, and access to various amenities and disamenities in the surrounding area. The provided model leverages existing Amsterdam Airbnb data from 2018 to inform and construct a model that generalizes to today’s rental market. The model output, potential revenue, gives users an accurate understanding of the possible financial gains that can result from renting out their home on the Airbnb platform. Given the challenging regulatory landscape, Checkbnb seeks to simplify the decision-making process for prospective hosts through an easy-to-use platform and accurate model.

Model Data

The data used in our final Checkbnb model was sourced from Airbnb-provided data on 2018 listings in Amsterdam, open data provided by the City of Amsterdam, and Open Street Map (OSM) data that crowd-sources locations of various landmarks and amenities across the city. The Airbnb data consists of information on listing price, number of bedrooms and bathrooms, as well as other data on reviews and available amenities to home guests. The city’s data includes information on locations of city-provided services and neighborhood features, as well as information on land use types. Lastly, the OSM data provides other locational data on city amenities and features.

#Setting the bounding box to pull in OSM data
xmin = st_bbox(districts)[[1]]
ymin = st_bbox(districts)[[2]]
xmax = st_bbox(districts)[[3]]  
ymax = st_bbox(districts)[[4]]

# Bars
bar <- opq(bbox = c(xmin, ymin, xmax, ymax)) %>%
  add_osm_feature(key = 'amenity', value = c("bar", "biergarten", "pub")) %>%
  osmdata_sf()
bar <-
  bar$osm_points %>%
  .[districts,]

# Restaurants
restaurant <- opq(bbox = c(xmin, ymin, xmax, ymax)) %>%
  add_osm_feature(key = 'amenity', value = c("restaurant", "cafe")) %>%
  osmdata_sf()
restaurant <-
  restaurant$osm_points %>%
  .[districts,]

# Universities
university <- opq(bbox = c(xmin, ymin, xmax, ymax)) %>%
  add_osm_feature(key = 'amenity', value = c("university", "college")) %>%
  osmdata_sf()
university <-
  university$osm_points %>%
  .[districts,]

## Schools
schools <- opq(bbox = c(xmin, ymin, xmax, ymax)) %>%
  add_osm_feature(key = 'amenity', value = c("school")) %>%
  osmdata_sf()
schools <-
  schools$osm_points %>%
  .[districts,]

## land use - retail
retail <- opq(bbox = c(xmin, ymin, xmax, ymax)) %>%
  add_osm_feature(key = 'landuse', value = c("retail")) %>%
  osmdata_sf()
retail <-
  retail$osm_points %>%
  .[districts,]

## stadiums/sports centers
stadiums <- opq(bbox = c(xmin, ymin, xmax, ymax)) %>%
  add_osm_feature(key = 'leisure', value = c("stadium")) %>%
  osmdata_sf()
stadiums <-
  stadiums$osm_points %>%
  .[districts,]

## Parks
parks_osm <- opq(bbox = c(xmin, ymin, xmax, ymax)) %>%
  add_osm_feature(key = 'leisure', value = c("park")) %>%
  osmdata_sf()
parks_osm <-
  parks_osm$osm_points %>%
  .[districts,]

## industrial buildings
industrial <- opq(bbox = c(xmin, ymin, xmax, ymax)) %>%
  add_osm_feature(key = 'building', value = c("industrial")) %>%
  osmdata_sf()
industrial <-
  industrial$osm_points %>%
  .[districts,]

Exploratory Analysis

Before engineering any additional features for our model, we reviewed data available from the Airbnb dataset. Figure 1 below maps the locations and prices of Amsterdam Airbnbs in 2018. With regard to the spatial process of rental price and number of Airbnbs, there’s a higher concentration of Airbnb rentals in the more central part of the city, with fewer units extending outward. Higher prices are also more concentrated in the city center.

Figures 2 and 3 show some of the categorical and numerical features included in the Airbnb dataset and their relationship to price. We can see that, on average, price is higher when units have real beds and offer the entire home to the renter as opposed to a single bedroom. Some neighborhoods, including Buiksloterham located on the northern part of the canal, have higher prices on average and property types like serviced apartments and villas also attract higher priced rentals. The numerical features indicate a positive relationship between price and the number of beds, baths, the number of people the home can accommodate, and how often the home is available during the year.

#######################
# Neighborhood Plot
######################
listings_details.sf  <- listings_details %>% 
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326, agr = "constant") %>%
  st_transform(st_crs(neighborhoods_new.sf))

# Strip both the dollar sign and thousands separators so values like "$1,200.00" parse as numbers
listings_details.sf$price2 = as.numeric(gsub("[$,]", "", listings_details.sf$price))

ggplot() +
  geom_sf(data = neighborhoods_new.sf, fill = "#2f4550") + 
  geom_sf(data = listings_details.sf, aes(colour = q5(price2)), 
          show.legend = "point", size = .1) +
  scale_colour_manual(values = paletteorngs,
                      labels = qBr(listings_details.sf, "price2"),
                      name = "Nightly Airbnb Price\n(Quintile Breaks)") +
  labs(title="Nightly Airbnb Price, Amsterdam",
       subtitle = "Figure 1") +
  mapTheme()

#taking out outlier property types
listings_details.sf = filter(listings_details.sf, 
                               property_type != "Lighthouse" & 
                               property_type != "Earth house" & 
                               property_type != "Nature lodge" & 
                               property_type != "Castle" &
                               property_type != "Tent" &
                               property_type != "Campsite")

# price2 lives on the sf object, so drop the geometry before reshaping
st_drop_geometry(listings_details.sf) %>% 
  dplyr::select(price2, property_type, room_type, neighbourhood, bed_type) %>%
  gather(Variable, Value, -price2) %>% 
   ggplot(aes(Value, price2)) +
     geom_bar(position = "dodge", stat = "summary", fun = "mean", fill ="#7fc3dc", col="#1a81a2", alpha = 0.9 ) +
     facet_wrap(~Variable, ncol = 1, scales = "free") +
  labs(title = "Price as a function of categorical variables", y = "Mean Price", subtitle = "Figure 2") +
     plotTheme() + theme(axis.text.x = element_text(angle = 45, size=20, hjust = 1),
                         axis.text.y = element_text(size = 20),
                         plot.title = element_text(size = 30),
                         plot.subtitle = element_text(size = 20),
                         axis.title.x = element_text(size = 20),
                         axis.title.y = element_text(size = 20))

st_drop_geometry(listings_details.sf) %>% 
  dplyr::select(price2, accommodates, bedrooms, bathrooms, availability_365) %>%
  filter(price2 <= 1000000) %>%
  gather(Variable, Value, -price2) %>% 
  ggplot(aes(Value, price2)) +
  geom_point(shape = 16, size = 3,color= "#cde1b1", alpha = 0.5) + geom_smooth(method = "lm", se=F, colour = "#E86E23") +
  facet_wrap(~Variable, ncol = 4, scales = "free") +
  labs(title = "Price as a function of numerical variables", subtitle = "Figure 3") +
  plotTheme() + theme(plot.title = element_text(size = 30),
                      plot.subtitle = element_text(size = 20),
                      axis.title.x = element_text(size = 20),
                      axis.title.y = element_text(size = 20))

Feature Engineering

Apart from some of the features in the original Airbnb dataset, we engineered a series of variables for our final model with the goal of minimizing error in rental price. We wanted to ensure our model was both accurate and generalizable in predicting a range of property types and locations.

We engineered variables based on descriptions that Airbnb hosts have posted about their home and analyzed them to see if there were keywords associated with higher or lower prices. We used a nearest neighbor method to measure access to various services and amenities in the city, such as swimming areas and markets. We calculated lagPrice for each listing, which is the average price of the three closest Airbnbs, and used local Moran’s I to identify highly significant clusters of high-priced homes, from which we derived the distance from each listing to the closest cluster.

Description Analysis

We first analyzed the descriptions associated with each listing to determine key words associated with higher or lower prices. We in turn created dummy variables associated with different price points.

listings_details.sf$luxury <- ifelse(grepl("luxur",  ignore.case=TRUE, listings_details.sf$name), "yes", "no") 

listings_details.sf$canal <- ifelse(grepl("canal", ignore.case=TRUE, listings_details.sf$name), "yes", "no") 

listings_details.sf$expamen <- ifelse(grepl("view|terrace|spac|rooftop|loft|roof", ignore.case=TRUE, listings_details.sf$name), "yes", "no") 

listings_details.sf$expcodes <- ifelse(grepl("family|big|light|heart|large|design|pijp|jordaan", ignore.case=TRUE, listings_details.sf$name), "yes", "no") 

listings_details.sf$citycenterdesc <- ifelse(grepl("center|centre",ignore.case=TRUE, listings_details.sf$name), "yes", "no") 

listings_details.sf$cheapcodes <- ifelse(grepl("cozy|cosy|free|room|little|bed|garden|vondelpark", ignore.case=TRUE, listings_details.sf$name), "yes", "no") 

listings_details.sf$pool <- ifelse(grepl("Pool", ignore.case=TRUE, listings_details.sf$amenities), "yes", "no")

listings_details.sf$expamencat <- ifelse(grepl("friendly|detector|workspace|hot|water|parking|private|first|aid|greets|luggage|wide|kit|linens|dropoff|balcony|step-free|books|Wide|dryer|laptop|access|premises", ignore.case=TRUE, listings_details.sf$amenities), "yes", "no")


listings_details.sf$expsummary <- ifelse(grepl("located|minutes|walk|bars|view|center|away|museum|tram", ignore.case=TRUE, listings_details.sf$summary), "yes", "no")


# No trailing "|" in the pattern: an empty alternative would match every non-missing description
listings_details.sf$expdescrip <- ifelse(grepl("kitchen|located|floor|garden|walk|large|double|open|centre|beautiful|center|terrace|fully|canal|big|shops|two|close|view|also|famous|bright", ignore.case=TRUE, listings_details.sf$description), "yes", "no")


listings_details.sf$expneighbdesc <- ifelse(grepl("walk|city| shops|bars|nice|just|museum|distance|close|min|Jordaan|best|local|Anne|popular|trendy|Gogh|Pijp|quiet|meters|right|one|park|Amstel|Cuyp", ignore.case=TRUE, listings_details.sf$summary), "yes", "no")

# No trailing "|" in the pattern: an empty alternative would match every non-missing summary
listings_details.sf$expneighwoclou<- ifelse(grepl("Pijp|Plantage|Westelijke|Zeeburg|Zeeheldenbuurt|Lastage|Weesperbuurt|Oud-Zuid", ignore.case=TRUE, listings_details.sf$summary), "yes", "no")
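
The keyword dummies above all follow the same pattern, which can be sketched on toy data. This is a minimal illustration only: the listing names are made up, and keyword_flag is a hypothetical helper that does not appear in the original analysis. It also shows why a pattern should not end in "|", since an empty alternative matches any non-missing string.

```r
# Toy listing names (made up for illustration; not taken from the Airbnb data).
toy_names <- c("Luxury canal loft", "Cosy room near Vondelpark", NA, "Bright studio")

# Hypothetical helper: "yes" when any keyword matches, "no" otherwise.
# grepl() returns FALSE for NA input, so missing text becomes "no".
keyword_flag <- function(text, pattern) {
  ifelse(grepl(pattern, text, ignore.case = TRUE), "yes", "no")
}

luxury_flag <- keyword_flag(toy_names, "luxur")
cheap_flag  <- keyword_flag(toy_names, "cozy|cosy|room")

# A trailing "|" adds an empty alternative, which matches any non-missing string.
trailing_flag <- keyword_flag(toy_names, "canal|")

print(data.frame(toy_names, luxury_flag, cheap_flag, trailing_flag))
```

In this sketch only the first name is flagged as luxury and only the second as cheap, while the trailing-pipe pattern flags every name that is not missing.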

Access to amenities and services

For a series of variables from OSM and Amsterdam’s open data portal, we calculated the nearest neighbor distance from each listing to the below (dis)amenities and services around the city.

st_c <- st_coordinates

## Metro Stops
metro_stops.sf <- metro_stops%>%
  st_transform(st_crs(districts.sf)) %>%
  st_as_sf()
metro_stops.sf <- st_join(metro_stops.sf, districts.sf, join = st_intersects, left = FALSE)

listings_details.sf <-
  listings_details.sf %>%
  mutate(
    metrostops_nn2 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(metro_stops.sf)), 2))

## Swimming Areas
swim.sf <- swim%>%
  st_transform(st_crs(districts.sf)) %>%
  st_as_sf()
swim.sf <- st_join(swim.sf, districts.sf, join = st_intersects, left = FALSE)

listings_details.sf <-
  listings_details.sf %>%
  mutate(
    swim_nn1 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(swim.sf)), 1))

## Wall Art
wall_art.sf <- wall_art%>%
  st_transform(st_crs(districts.sf)) %>%
  st_as_sf()
wall_art.sf <- st_join(wall_art.sf, districts.sf, join = st_intersects, left = FALSE)

listings_details.sf <-
  listings_details.sf %>%
  mutate(
    wallart_nn3 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(wall_art.sf)), 3))

## Markets
markets.sf <- markets%>%
  st_transform(st_crs(districts.sf)) %>%
  st_as_sf()
markets.sf <- st_join(markets.sf, districts.sf, join = st_intersects, left = FALSE)

listings_details.sf <-
  listings_details.sf %>%
  mutate(
    markets_nn1 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(markets.sf)), 1))

## Playgrounds
playgrounds.sf <- playgrounds %>%
  st_transform(st_crs(districts.sf)) %>%
  st_as_sf()
playgrounds.sf <- st_join(playgrounds.sf, districts.sf, join = st_intersects, left = FALSE)

listings_details.sf <-
  listings_details.sf %>%
  mutate(
    playgrounds_nn2 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(playgrounds.sf)), 2))

## Student Housing
students.sf <- student_housing %>%
  st_transform(st_crs(districts.sf)) %>%
  st_as_sf()
students.sf <- st_join(students.sf, districts.sf, join = st_intersects, left = FALSE)

listings_details.sf <-
  listings_details.sf %>%
  mutate(
    students_nn2 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(students.sf)), 2))

## Historic buildings
hist_build.sf <- hist_build %>%
  st_transform(st_crs(districts.sf)) %>%
  st_as_sf()
hist_build.sf <- st_join(hist_build.sf, districts.sf, join = st_intersects, left = FALSE)

listings_details.sf <-
  listings_details.sf %>%
  mutate(
    histbuild_nn3 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(hist_build.sf)), 3))

## Monuments
monuments.sf <- monuments %>%
  st_transform(st_crs(districts.sf)) %>%
  st_as_sf()
monuments.sf <- st_join(monuments.sf, districts.sf, join = st_intersects, left = FALSE)

listings_details.sf <-
  listings_details.sf %>%
  mutate(
    monuments_nn3 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(monuments.sf)), 3))

## Bars
bar.sf <- bar %>%
  st_transform(st_crs(districts.sf)) %>%
  st_as_sf()
bar.sf <- st_join(bar.sf, districts.sf, join = st_intersects, left = FALSE)

listings_details.sf <-
  listings_details.sf %>%
  mutate(
    bar_nn2 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(bar.sf)), 2))

## Restaurants
restaurant.sf <- restaurant %>%
  st_transform(st_crs(districts.sf)) %>%
  st_as_sf()
restaurant.sf <- st_join(restaurant.sf, districts.sf, join = st_intersects, left = FALSE)

listings_details.sf <-
  listings_details.sf %>%
  mutate(
    restaurant_nn3 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(restaurant.sf)), 3))

## University
university.sf <- university %>%
  st_transform(st_crs(districts.sf)) %>%
  st_as_sf()
university.sf <- st_join(university.sf, districts.sf, join = st_intersects, left = FALSE)

listings_details.sf <-
  listings_details.sf %>%
  mutate(
    university_nn1 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(university.sf)), 1))

## Schools
schools.sf <- schools %>%
  st_transform(st_crs(districts.sf)) %>%
  st_as_sf()
schools.sf <- st_join(schools.sf, districts.sf, join = st_intersects, left = FALSE)

listings_details.sf <-
  listings_details.sf %>%
  mutate(
    schools_nn2 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(schools.sf)), 2))

## Retail
retail.sf <- retail %>%
  st_transform(st_crs(districts.sf)) %>%
  st_as_sf()
retail.sf <- st_join(retail.sf, districts.sf, join = st_intersects, left = FALSE)

listings_details.sf <-
  listings_details.sf %>%
  mutate(
    retail_nn3 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(retail.sf)), 3))

## Industrial
industrial.sf <- industrial %>%
  st_transform(st_crs(districts.sf)) %>%
  st_as_sf()
industrial.sf <- st_join(industrial.sf, districts.sf, join = st_intersects, left = FALSE)

listings_details.sf <-
  listings_details.sf %>%
  mutate(
    industrial_nn1 = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(industrial.sf)), 1))
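
The nn_function helper used above is not defined in this section. As a rough sketch of the idea, and assuming it returns the average distance from each listing to its k nearest features (which matches the later description of histbuild_nn3), the calculation can be written in base R. The knn_mean_dist name and the toy coordinates are hypothetical.

```r
# Hypothetical stand-in for nn_function(): mean Euclidean distance from each
# row of `from` to its k nearest rows of `to`. Both inputs are two-column
# coordinate matrices in a projected CRS (units such as metres).
knn_mean_dist <- function(from, to, k) {
  stopifnot(ncol(from) == 2, ncol(to) == 2, k >= 1, k <= nrow(to))
  apply(from, 1, function(p) {
    d <- sqrt((to[, 1] - p[1])^2 + (to[, 2] - p[2])^2)
    mean(sort(d)[seq_len(k)])   # average of the k smallest distances
  })
}

# Toy coordinates (made up): two listings and three amenities.
listings_xy  <- rbind(c(0, 0), c(10, 0))
amenities_xy <- rbind(c(3, 4), c(0, 10), c(10, 5))

nn1 <- knn_mean_dist(listings_xy, amenities_xy, 1)
nn2 <- knn_mean_dist(listings_xy, amenities_xy, 2)
print(nn1)
print(nn2)
```

A brute-force loop like this is fine for a sketch, but for tens of thousands of listings a spatial index or a dedicated k-nearest-neighbor library would likely be much faster.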

Average price of three closest rentals

We eliminated outlier property types and calculated the average price of the three closest rental properties for each listing.

listings_details.sf <- subset(listings_details.sf, price2 > 0)

listings_details.sf = filter(listings_details.sf, 
                               property_type != "Lighthouse" & 
                               property_type != "Earth house" & 
                               property_type != "Nature lodge" & 
                               property_type != "Castle" &
                               property_type != "Tent" &
                               property_type != "Campsite" &
                               property_type != "Barn")


k_nearest_neighbors = 3

#prices
coords <- st_coordinates(listings_details.sf) 

# k nearest neighbors
neighborList <- knn2nb(knearneigh(coords, k_nearest_neighbors))
spatialWeights <- nb2listw(neighborList, style="W")
listings_details.sf$lagPrice <- lag.listw(spatialWeights, listings_details.sf$price2)
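
To make the spatial lag concrete, the sketch below computes the same kind of quantity in base R on toy data: with equal (row-standardized) weights on the k nearest neighbors, the lag is the mean of the neighbors' prices. The knn_lag helper and the coordinates are hypothetical and only illustrate the idea behind lagPrice.

```r
# Hypothetical helper: mean price of each listing's k nearest neighbours,
# excluding the listing itself.
knn_lag <- function(xy, price, k) {
  d <- as.matrix(dist(xy))   # pairwise Euclidean distances
  diag(d) <- Inf             # a listing is not its own neighbour
  sapply(seq_len(nrow(xy)), function(i) {
    mean(price[order(d[i, ])[seq_len(k)]])
  })
}

# Toy data (made up): four listings along a line.
xy    <- cbind(x = c(0, 1, 2, 10), y = c(0, 0, 0, 0))
price <- c(100, 200, 300, 1000)

lag_price <- knn_lag(xy, price, k = 2)
print(lag_price)
```

Note that the isolated expensive listing at x = 10 receives a low lag value, because its nearest neighbors are the cheaper listings far away; the lag describes the surroundings, not the listing itself.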

Local Moran’s I

Figure 4 shows more localized clustering of higher-priced homes along the canal as well as just north of it. Figure 5 identifies the statistically significant hotspots of high-priced rentals (fishnet cells with a local Moran’s I p-value of 0.001 or less). We converted these hotspot cells into polygons and calculated the distance from each rental property to its closest significant fishnet cell.

# Create fishnet
fishnet <- 
  st_make_grid(neighborhoods_new.sf, cellsize = 200) %>%
  st_sf() %>%
  mutate(uniqueID = rownames(.))

# Create fishnet with the average price of the rental properties located in each grid cell
price_net <- 
  dplyr::select(listings_details.sf) %>% 
  mutate(price = listings_details.sf$price2) %>% 
  aggregate(., fishnet, mean) %>%
  mutate(price = replace_na(price, 0),
         uniqueID = rownames(.),
         cvID = sample(round(nrow(fishnet) / 24), size=nrow(fishnet), replace = TRUE))

price_net <- subset(price_net, price > 0)

ggplot() +
  geom_sf(data=fishnet, fill = "grey40") +
  geom_sf(data = price_net, aes(fill = price)) +
  scale_fill_viridis() +
  labs(title = "Average Rental Price", subtitle = "Figure 4") +
  mapTheme()

# Local Moran's I
## Create neighbor list and spatial weights matrix
final_net.nb <- poly2nb(as_Spatial(price_net), queen=TRUE)
final_net.weights <- nb2listw(final_net.nb, style="W", zero.policy=TRUE)

# Combining the price_net with localmoran test
final_net.localMorans <- 
  cbind(
    as.data.frame(localmoran(price_net$price, final_net.weights)),
    as.data.frame(price_net)) %>% 
    st_sf() %>%
      dplyr::select(Avg_Price = price, 
                    Local_Morans_I = Ii, 
                    P_Value = `Pr(z > 0)`) %>%
      mutate(Significant_Hotspots = ifelse(P_Value <= 0.001, 1, 0)) %>%
      gather(Variable, Value, -geometry)
  
vars <- unique(final_net.localMorans$Variable)
varList <- list()

for(i in vars){
  varList[[i]] <- 
    ggplot() +
      geom_sf(data = fishnet, fill = "grey40") +
      geom_sf(data = filter(final_net.localMorans, Variable == i), 
              aes(fill = Value), colour=NA) +
      scale_fill_viridis(name="") +
      labs(title=i, subtitle = "Figure 5") +
      mapTheme() + theme(legend.position="bottom")}

do.call(grid.arrange,c(varList, ncol = 4, top = "Local Morans I statistics, Price"))

# Keep only the fishnet cells flagged as significant hotspots
sig_net <- 
  dplyr::filter(final_net.localMorans, Variable == "Significant_Hotspots", Value == 1)

sig_net.sf <- sig_net %>%
  st_as_sf(coords = "geometry", crs = 4326, agr = "constant") %>%
  st_transform('EPSG:28992')

listings_details.sf <-
  listings_details.sf %>%
  mutate(
    sig_cell = nn_function(st_c(st_centroid(listings_details.sf)), st_c(st_centroid(sig_net.sf)), 1))
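
For readers unfamiliar with the statistic, the following base R sketch computes local Moran’s I by hand on a toy example using the standard formula, I_i = (z_i / m2) * sum_j w_ij z_j, where z is the deviation from the mean and m2 is the variance with denominator n. The local_moran helper, the five toy cells, and their adjacency are all assumptions made for illustration; it does not reproduce the significance test used above.

```r
# Hypothetical helper: local Moran's I for values x and a row-standardised
# weights matrix w (each row sums to 1, zero diagonal).
local_moran <- function(x, w) {
  z  <- x - mean(x)
  m2 <- sum(z^2) / length(x)        # variance with denominator n
  as.vector((z / m2) * (w %*% z))   # z_i / m2 * sum_j w_ij * z_j
}

# Toy example: five cells in a row; each cell's neighbours are the adjacent cells.
x   <- c(10, 12, 11, 50, 55)
adj <- matrix(0, 5, 5)
for (i in 1:4) {
  adj[i, i + 1] <- 1
  adj[i + 1, i] <- 1
}
w <- adj / rowSums(adj)

Ii <- local_moran(x, w)
print(round(Ii, 3))

# Global Moran's I for comparison; with row-standardised weights it equals
# the mean of the local values.
z        <- x - mean(x)
global_I <- (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
print(c(mean_local = mean(Ii), global = global_I))
```

Cells with a high value surrounded by high values (the last two cells here) get a positive local statistic, which is the pattern the hotspot flag is looking for.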

Transforming Variables

We transformed three of our numerical variables and one categorical variable to improve their predictive power in the model.

listings_details.sf <- 
  listings_details.sf %>% 
  # case_when() uses the first condition that matches, so the bins run from largest to smallest
  mutate(size_of_group = 
          case_when(guests_included > 4  ~ "Large Group",
                    guests_included >= 2 ~ "Small Group",
                    guests_included >= 1 ~ "Solo"))

listings_details.sf <- 
  listings_details.sf %>% 
  # Bins share their boundaries so that fractional values such as 250.5 are not left as NA
  mutate(expneighbors = 
          case_when(lagPrice <= 90  ~ "Cheap Neighbors",
                    lagPrice <= 250 ~ "Average Neighbors",
                    lagPrice > 250  ~ "Expensive Neighbors"))

listings_details.sf <- 
  listings_details.sf %>% 
  # Bins share their boundaries so that fractional distances such as 368.5 are not left as NA
  mutate(hotspotlevel = 
          case_when(sig_cell <= 368  ~ "Not Hot Spot",
                    sig_cell <= 1500 ~ "Semi-Hot",
                    sig_cell > 1500  ~ "Hot Spot"))

# Dummy variable: 1 when the listing is an entire home or apartment, 0 otherwise
listings_details.sf <- listings_details.sf %>%
  mutate(Entire_Home = as.numeric(room_type %in% "Entire home/apt"))
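
When binning a continuous variable, it helps to check that the bins cover every possible value. As a hedged alternative sketch (not the approach used above), base R's cut() builds contiguous intervals from a single vector of break points, so no value can fall between two bins. The toy prices are made up; the break points of 90 and 250 are borrowed from the neighbor price categories.

```r
# Toy neighbour prices (made up), including values at and between the cut points.
lag_price <- c(60, 90, 90.5, 250, 250.4, 400)

# Right-closed intervals: (-Inf, 90], (90, 250], (250, Inf]
neighbor_class <- cut(lag_price,
                      breaks = c(-Inf, 90, 250, Inf),
                      labels = c("Cheap Neighbors", "Average Neighbors", "Expensive Neighbors"))

print(data.frame(lag_price, neighbor_class))
print(anyNA(neighbor_class))
```

Because the intervals are defined by shared break points, a value such as 250.4 lands in the top bin rather than being left unclassified.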

Notable Variables

The correlation matrix below in Figure 6 shows the numerical variables included in our final model. These are a combination of variables from the Airbnb dataset and the engineered features. Figure 7 looks at three of the engineered features: average distance to the three nearest historical buildings, average price of the three nearest rentals, and the existence of coded language affiliated with cheaper properties.

# Numeric variables for the correlation matrix (each column listed once)
Vars <- listings_details.sf %>%
  dplyr::select(price2,
                accommodates, bathrooms, bedrooms, availability_365, number_of_reviews,
                review_scores_rating, markets_nn1, metrostops_nn2, playgrounds_nn2,
                histbuild_nn3, monuments_nn3, restaurant_nn3, university_nn1,
                retail_nn3, industrial_nn1, lagPrice, sig_cell, minimum_nights,
                students_nn2, wallart_nn3, schools_nn2, beds,
                review_scores_accuracy, review_scores_cleanliness,
                review_scores_communication, review_scores_location,
                review_scores_value, reviews_per_month, guests_included) %>%
  na.omit()


Vars <- st_drop_geometry(Vars)

ggcorrplot(
  round(cor(Vars), 1), 
  p.mat = cor_pmat(Vars),
  colors = c("#037499","light grey","#f6b492"), 
  type="lower",
  insig = "blank") +  
  labs(title = "Correlation across numeric variables", subtitle = "Figure 6") +
  plotTheme() + theme(axis.text.x = element_text(angle = 45, size=25, hjust = 1),
                         axis.text.y = element_text(size = 25),
                         plot.title = element_text(size = 30),
                         plot.subtitle = element_text(size = 20))

Hist_build_plot <- st_drop_geometry(listings_details.sf) %>% 
  dplyr::select(price2, histbuild_nn3) %>%
  filter(price2 <= 1000000) %>%
  gather(Variable, Value, -price2) %>% 
  ggplot(aes(Value, price2)) +
  geom_point(shape = 16, size = 1, color= "#72b7cd", alpha = 0.7) + geom_smooth(method = "lm", se=F, colour = "#E86E23") +
  facet_wrap(~Variable, ncol = 3, scales = "free") +
  plotTheme()
             
lagPrice_plot <- st_drop_geometry(listings_details.sf) %>% 
  dplyr::select(price2, lagPrice) %>%
  filter(price2 <= 1000000) %>%
  gather(Variable, Value, -price2) %>% 
  ggplot(aes(Value, price2)) +
  geom_point(shape = 16, size = 1,color= "#cde1b1", alpha = 0.5) + geom_smooth(method = "lm", se=F, colour = "#E86E23") +
  facet_wrap(~Variable, ncol = 4, scales = "free") +
  plotTheme()

cheapcodes_plot <- st_drop_geometry(listings_details.sf) %>% 
  dplyr::select(price2, cheapcodes) %>%
  filter(price2 <= 1000000) %>%
  gather(Variable, Value, -price2) %>% 
  ggplot(aes(Value, price2)) +
  geom_point(shape = 16, size = 1,color= "#f6c192", alpha = 0.5) + geom_smooth(method = "lm", se=F, colour = "#E86E23") +
  facet_wrap(~Variable, ncol = 3, scales = "free") +
  plotTheme()

grid.arrange(Hist_build_plot, lagPrice_plot, cheapcodes_plot, ncol=3, top = "Price as a function of average distance to three nearest historical buildings, average price of three nearest rentals, and the existence of coded language affiliated with cheaper properties", bottom = "Figure 7")

The Checkbnb Model

After engineering the features, our final model and list of variables can be seen below. We used a combination of categorical and numerical variables to build and hone our model in order to minimize error and ensure it generalizes to a wide set of rental property types and locations within the city. The R-squared value of approximately 0.5 indicates that the model accounts for roughly 50% of the variation in nightly rental price.
Regression Results Summary
Estimate Std. Error t value Pr(>|t|)
(Intercept) 51.9234565 37.7657583 1.3748819 0.1691877
property_typeApartment -30.1482621 34.4195089 -0.8759062 0.3810944
property_typeBed and breakfast -28.8754265 34.5698494 -0.8352778 0.4035741
property_typeBoat -19.3271374 34.5802629 -0.5589066 0.5762334
property_typeBoutique hotel -12.1011289 36.8283910 -0.3285815 0.7424764
property_typeBungalow -16.1572398 40.5445463 -0.3985059 0.6902628
property_typeCabin -5.5548055 39.2372998 -0.1415695 0.8874219
property_typeCasa particular (Cuba) -55.3502898 48.5888842 -1.1391554 0.2546558
property_typeChalet -28.8919435 54.3463682 -0.5316260 0.5949926
property_typeCondominium -22.1900339 34.6055817 -0.6412270 0.5213846
property_typeCottage -21.3745878 39.8197358 -0.5367838 0.5914246
property_typeGuest suite -25.9389283 34.7931058 -0.7455192 0.4559691
property_typeGuesthouse 1.4852115 36.1777471 0.0410532 0.9672540
property_typeHostel -78.0002166 50.9585932 -1.5306587 0.1258739
property_typeHotel 8.8917324 42.1051789 0.2111791 0.8327503
property_typeHouse -24.4458320 34.4783705 -0.7090194 0.4783230
property_typeHouseboat 6.0039978 34.6755871 0.1731477 0.8625376
property_typeLoft 3.9072278 34.5702994 0.1130227 0.9100140
property_typeOther -31.4334659 35.8915960 -0.8757890 0.3811582
property_typeServiced apartment 21.2355818 35.4832759 0.5984673 0.5495368
property_typeTiny house -32.0456913 45.6052324 -0.7026758 0.4822683
property_typeTownhouse -10.6583706 34.5273543 -0.3086935 0.7575588
property_typeVilla -4.2348765 36.5521097 -0.1158586 0.9077661
accommodates 18.6472397 0.6950072 26.8302827 0.0000000
bathrooms 3.2206258 0.5549017 5.8039569 0.0000000
bedrooms 22.0929726 0.9417007 23.4607167 0.0000000
neighbourhoodBanne Buiksloot -12.5030324 11.4302664 -1.0938531 0.2740362
neighbourhoodBos en Lommer -15.6580507 5.4818534 -2.8563425 0.0042912
neighbourhoodBuiksloterham -30.5488617 10.0650285 -3.0351491 0.0024081
neighbourhoodBuikslotermeer -21.8410583 10.2150054 -2.1381348 0.0325212
neighbourhoodBuitenveldert-Oost -34.7145803 10.9203703 -3.1788831 0.0014813
neighbourhoodBuitenveldert-West -30.6230575 8.7555958 -3.4975413 0.0004709
neighbourhoodDe Pijp -5.3031527 4.5599270 -1.1629907 0.2448510
neighbourhoodDe Wallen 22.2293171 5.2476865 4.2360223 0.0000229
neighbourhoodFrederik Hendrikbuurt -7.8029454 5.6104282 -1.3907932 0.1643079
neighbourhoodGrachtengordel 1.7767408 4.1723203 0.4258400 0.6702303
neighbourhoodHoofddorppleinbuurt -23.0442186 5.9651410 -3.8631473 0.0001124
neighbourhoodIJplein en Vogelbuurt -28.3913424 7.2077893 -3.9389806 0.0000822
neighbourhoodIndische Buurt -16.4528140 5.0670328 -3.2470313 0.0011686
neighbourhoodJordaan 0.3452296 4.6165440 0.0747810 0.9403899
neighbourhoodKadoelen -16.1939612 17.3256121 -0.9346834 0.3499659
neighbourhoodLandelijk Noord -40.9608341 14.0701949 -2.9111774 0.0036058
neighbourhoodMuseumkwartier 4.5846795 5.7742546 0.7939864 0.4272154
neighbourhoodNieuwendam-Noord -43.8522503 12.5471704 -3.4949912 0.0004754
neighbourhoodNieuwendammerdijk en Buiksloterdijk -42.0188355 13.3816237 -3.1400401 0.0016924
neighbourhoodNieuwendammerham -3.0199954 23.7109842 -0.1273669 0.8986516
neighbourhoodNieuwmarkt en Lastage -0.8774195 5.4182284 -0.1619385 0.8713564
neighbourhoodOost -3.5501865 6.7045959 -0.5295154 0.5964555
neighbourhoodOostelijke Eilanden en Kadijken -20.2432887 5.9611171 -3.3958884 0.0006858
neighbourhoodOosterparkbuurt -14.2628639 4.9976308 -2.8539251 0.0043239
neighbourhoodOostzanerwerf -16.6012850 18.9222381 -0.8773426 0.3803140
neighbourhoodOsdorp -38.3159116 11.3125357 -3.3870312 0.0007083
neighbourhoodOud-West -13.3910917 4.3047785 -3.1107504 0.0018695
neighbourhoodOud-Zuid -10.3931262 6.0285815 -1.7239754 0.0847320
neighbourhoodOvertoomse Veld -20.7560277 7.6631425 -2.7085530 0.0067651
neighbourhoodRivierenbuurt -5.5851324 5.7191437 -0.9765679 0.3287982
neighbourhoodSlotermeer-Noordoost -37.5116651 10.3687880 -3.6177483 0.0002981
neighbourhoodSlotermeer-Zuidwest -28.5622169 10.1934081 -2.8020282 0.0050845
neighbourhoodSlotervaart -29.5165934 7.7865523 -3.7907141 0.0001508
neighbourhoodSpaarndammer en Zeeheldenbuurt -28.5130577 6.0192055 -4.7370135 0.0000022
neighbourhoodStadionbuurt -23.9118296 6.7280896 -3.5540296 0.0003805
neighbourhoodTuindorp Buiksloot -37.4024176 11.3811219 -3.2863559 0.0010172
neighbourhoodTuindorp Nieuwendam -53.9716424 12.3060500 -4.3857812 0.0000116
neighbourhoodTuindorp Oostzaan -39.5627108 12.6761103 -3.1210450 0.0018054
neighbourhoodVolewijck -36.2208164 8.5931260 -4.2150920 0.0000251
neighbourhoodWatergraafsmeer -16.3057194 6.1917760 -2.6334479 0.0084606
neighbourhoodWeesperbuurt en Plantage -8.2474445 5.7059065 -1.4454223 0.1483593
neighbourhoodWestelijke Eilanden -9.8739618 6.1419949 -1.6076148 0.1079397
neighbourhoodZeeburg -13.1612594 6.2880850 -2.0930473 0.0363610
room_typePrivate room -34.4060600 1.5243603 -22.5708192 0.0000000
room_typeShared room -52.0929819 8.6419605 -6.0279125 0.0000000
availability_365 0.1392824 0.0051852 26.8616629 0.0000000
number_of_reviews -0.1374698 0.0139222 -9.8741435 0.0000000
minimum_nights -0.1203122 0.0373444 -3.2216931 0.0012770
markets_nn1 -0.0023564 0.0021340 -1.1042259 0.2695121
metrostops_nn2 0.0018127 0.0042383 0.4276859 0.6688857
students_nn2 -0.0022773 0.0022395 -1.0169033 0.3092151
wallart_nn3 -0.0090397 0.0022491 -4.0191870 0.0000587
playgrounds_nn2 0.0056290 0.0044678 1.2599002 0.2077241
histbuild_nn3 -0.0010773 0.0022068 -0.4881697 0.6254365
monuments_nn3 -0.0079749 0.0029689 -2.6861221 0.0072363
restaurant_nn3 -0.0034376 0.0057245 -0.6005013 0.5481809
university_nn1 -0.0012755 0.0018495 -0.6896397 0.4904310
schools_nn2 0.0027790 0.0033050 0.8408243 0.4004592
retail_nn3 0.0013180 0.0019848 0.6640297 0.5066811
industrial_nn1 0.0031671 0.0035054 0.9034874 0.3662811
luxuryyes 32.7730587 2.3770997 13.7869935 0.0000000
lagPrice 0.0213229 0.0134636 1.5837371 0.1132737
canalyes 13.8005572 2.0078416 6.8733296 0.0000000
expamenyes 6.1278234 1.1678133 5.2472627 0.0000002
expcodesyes 0.0108277 1.2024254 0.0090048 0.9928154
citycenterdescyes -1.3403139 1.2679790 -1.0570473 0.2905063
cheapcodesyes -1.8720373 1.0649445 -1.7578731 0.0787886
poolyes 25.9880724 7.9468232 3.2702467 0.0010769
sig_cell -0.0140910 0.0024157 -5.8331489 0.0000000
beds -0.0480557 0.6936357 -0.0692809 0.9447669
review_scores_accuracy 1.0814071 1.0053952 1.0756040 0.2821210
review_scores_cleanliness 3.3462583 0.7592616 4.4072533 0.0000105
review_scores_rating 0.8113450 0.1323486 6.1303620 0.0000000
review_scores_communication -0.2744106 1.0412383 -0.2635426 0.7921359
review_scores_location 2.0546188 0.8457447 2.4293604 0.0151366
review_scores_value -6.3871075 0.8498631 -7.5154545 0.0000000
reviews_per_month -0.8801721 0.4928991 -1.7857044 0.0741665
expamencatyes -2.5325977 7.0642992 -0.3585066 0.7199691
expsummaryyes 2.1598107 1.2765303 1.6919384 0.0906776
expdescripyes -16.8050807 8.0255362 -2.0939511 0.0362804
expneighbdescyes -3.9033671 2.0130022 -1.9390774 0.0525098
expneighwoclouyes 8.1577218 3.8724376 2.1066116 0.0351671
guests_included 0.2434092 0.6580766 0.3698797 0.7114771
cancellation_policymoderate 0.4397150 1.3228215 0.3324069 0.7395865
cancellation_policystrict_14_with_grace_period 4.5737599 1.3349542 3.4261549 0.0006138
cancellation_policysuper_strict_60 73.7729522 12.9140336 5.7126189 0.0000000
size_of_groupSolo 55.2314752 5.8855930 9.3841818 0.0000000
expneighborsCheap Neighbors 2.7660931 1.9681803 1.4054064 0.1599203
expneighborsExpensive Neighbors 4.4682453 3.0002041 1.4893138 0.1364249
hotspotlevelNot Hot Spot 3.6609692 3.4940747 1.0477650 0.2947631
hotspotlevelSemi-Hot -3.0407486 2.4246416 -1.2541023 0.2098235
R2 0.5063
Adjusted R2 0.5026

Training and Testing our Model

We trained and tested our model on the selected variables to ensure both accuracy and generalizability. We divided our rental property data and features into separate training and test sets. We fit the model on the training set and then used the test set to assess generalizability, checking how accurately the model predicts rental prices on data it did not see during fitting.
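The split described above can be sketched in base R as follows. This is a minimal illustration, not the code used for the report: the `listings` data frame is simulated, the 75/25 split ratio is an assumption, and the single `bedrooms` predictor stands in for the full variable set.

```r
# Hypothetical stand-in data; the report's actual listing data is not reproduced here.
set.seed(717)
listings <- data.frame(
  price2   = round(runif(200, min = 50, max = 400)),
  bedrooms = sample(1:4, 200, replace = TRUE)
)

# Hold out 25% of the rows as a test set (the ratio is an assumption).
test_idx    <- sample(seq_len(nrow(listings)), size = floor(0.25 * nrow(listings)))
rents_test  <- listings[test_idx, ]
rents_train <- listings[-test_idx, ]

# Fit on the training rows only, then predict prices for the unseen test rows.
fit       <- lm(price2 ~ bedrooms, data = rents_train)
test_pred <- predict(fit, newdata = rents_test)
```

Because the test rows play no part in fitting, the error measured on `test_pred` approximates how the model would perform on a new listing.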

The summary output of the model reports, for each variable, a coefficient estimate, its standard error, a t value, and a p-value, along with statistics (R2 and adjusted R2) that describe how much of the variation in price the model explains. The p-value for each variable is the probability of seeing an association at least this strong if the variable had no true relationship with rental price; small values (conventionally below 0.05) indicate a statistically significant predictor.
Regression Results Summary
Estimate Std. Error t value Pr(>|t|)
(Intercept) 64.6960647 38.3116586 1.6886783 0.0913096
property_typeApartment -31.4201420 33.7569507 -0.9307755 0.3519903
property_typeBed and breakfast -27.9788486 33.9601906 -0.8238720 0.4100303
property_typeBoat -13.8143051 33.9872542 -0.4064555 0.6844159
property_typeBoutique hotel -13.4682739 36.3689677 -0.3703232 0.7111488
property_typeBungalow -26.9728296 43.0706535 -0.6262461 0.5311666
property_typeCabin -6.3321886 40.4134428 -0.1566852 0.8754958
property_typeCasa particular (Cuba) -94.7458609 53.1712187 -1.7819012 0.0747931
property_typeChalet -2.7967574 67.5484015 -0.0414038 0.9669748
property_typeCondominium -26.4536970 34.0213846 -0.7775609 0.4368448
property_typeCottage -22.9958648 39.0854019 -0.5883492 0.5563101
property_typeGuest suite -30.5828082 34.2931625 -0.8918048 0.3725172
property_typeGuesthouse -8.6336686 36.0440428 -0.2395311 0.8106983
property_typeHostel -114.5381795 60.6833263 -1.8874737 0.0591231
property_typeHotel -24.6035491 47.6100260 -0.5167724 0.6053255
property_typeHouse -25.1949104 33.8393420 -0.7445449 0.4565629
property_typeHouseboat 1.6712511 34.0935821 0.0490195 0.9609046
property_typeLoft 4.1207538 33.9837463 0.1212566 0.9034900
property_typeOther -32.5173016 36.0702847 -0.9014983 0.3673433
property_typeServiced apartment 37.3088152 35.4784286 1.0515915 0.2930102
property_typeTiny house -42.8674221 47.7616414 -0.8975282 0.3694569
property_typeTownhouse -14.2954186 33.9014517 -0.4216757 0.6732700
property_typeVilla -10.5454421 36.6773486 -0.2875192 0.7737202
accommodates 17.5730971 0.8246172 21.3106128 0.0000000
bathrooms 2.2138821 0.5536150 3.9989564 0.0000640
bedrooms 20.5644041 1.1019322 18.6621325 0.0000000
neighbourhoodBanne Buiksloot -20.5748334 13.2997946 -1.5470038 0.1218912
neighbourhoodBos en Lommer -18.8590901 6.4776727 -2.9113991 0.0036054
neighbourhoodBuiksloterham -34.7481228 12.1581348 -2.8580143 0.0042711
neighbourhoodBuikslotermeer -22.7318428 12.0442862 -1.8873549 0.0591391
neighbourhoodBuitenveldert-Oost -37.3753546 11.9136498 -3.1371876 0.0017103
neighbourhoodBuitenveldert-West -37.8207036 10.2647080 -3.6845377 0.0002302
neighbourhoodDe Pijp -8.0031599 5.3761751 -1.4886345 0.1366125
neighbourhoodDe Wallen 23.2543062 6.1897867 3.7568833 0.0001729
neighbourhoodFrederik Hendrikbuurt -10.4328591 6.6423240 -1.5706640 0.1162896
neighbourhoodGrachtengordel 0.2981652 4.9282740 0.0605009 0.9517578
neighbourhoodHoofddorppleinbuurt -27.0317242 7.0761446 -3.8201204 0.0001341
neighbourhoodIJplein en Vogelbuurt -33.3950238 8.4611773 -3.9468531 0.0000797
neighbourhoodIndische Buurt -16.4615391 6.0078864 -2.7399884 0.0061541
neighbourhoodJordaan -3.4215418 5.4632509 -0.6262831 0.5311423
neighbourhoodKadoelen -36.3846610 20.5315682 -1.7721326 0.0764004
neighbourhoodLandelijk Noord -43.4223657 16.0539720 -2.7047740 0.0068457
neighbourhoodMuseumkwartier -5.7958781 6.8075719 -0.8513870 0.3945731
neighbourhoodNieuwendam-Noord -50.1373883 15.0498510 -3.3314209 0.0008669
neighbourhoodNieuwendammerdijk en Buiksloterdijk -47.9557044 15.7394211 -3.0468531 0.0023180
neighbourhoodNieuwendammerham 3.6088717 29.7954380 0.1211216 0.9035969
neighbourhoodNieuwmarkt en Lastage -5.8685231 6.4487998 -0.9100179 0.3628331
neighbourhoodOost -5.2940857 7.8822651 -0.6716452 0.5018238
neighbourhoodOostelijke Eilanden en Kadijken -21.2208410 7.0690946 -3.0019178 0.0026889
neighbourhoodOosterparkbuurt -16.0510968 5.9009365 -2.7200931 0.0065367
neighbourhoodOostzanerwerf -29.1041629 21.7257004 -1.3396191 0.1803970
neighbourhoodOsdorp -42.7509903 13.3295890 -3.2072249 0.0013441
neighbourhoodOud-West -13.9312778 5.0995897 -2.7318429 0.0063082
neighbourhoodOud-Zuid -13.3792564 7.0195993 -1.9059858 0.0566782
neighbourhoodOvertoomse Veld -19.9844074 8.9498004 -2.2329445 0.0255728
neighbourhoodRivierenbuurt -8.8682756 6.7506646 -1.3136893 0.1889783
neighbourhoodSlotermeer-Noordoost -36.7652321 12.0586338 -3.0488721 0.0023025
neighbourhoodSlotermeer-Zuidwest -40.3413443 12.3385486 -3.2695372 0.0010806
neighbourhoodSlotervaart -28.0883357 9.2740588 -3.0286993 0.0024618
neighbourhoodSpaarndammer en Zeeheldenbuurt -31.5662524 7.0780863 -4.4597157 0.0000083
neighbourhoodStadionbuurt -23.1839545 7.8334137 -2.9596234 0.0030868
neighbourhoodTuindorp Buiksloot -45.5347110 13.2057569 -3.4480955 0.0005667
neighbourhoodTuindorp Nieuwendam -49.1521909 14.8833481 -3.3024955 0.0009614
neighbourhoodTuindorp Oostzaan -54.1052297 14.8948352 -3.6324826 0.0002820
neighbourhoodVolewijck -34.2436055 10.2162711 -3.3518693 0.0008054
neighbourhoodWatergraafsmeer -19.4997139 7.2088683 -2.7049619 0.0068418
neighbourhoodWeesperbuurt en Plantage -8.4493723 6.7581441 -1.2502504 0.2112348
neighbourhoodWestelijke Eilanden -16.1831981 7.3061649 -2.2150059 0.0267802
neighbourhoodZeeburg -14.3198893 7.3431966 -1.9500893 0.0511910
room_typePrivate room -33.2564064 1.7811563 -18.6712453 0.0000000
room_typeShared room -57.0871743 9.9161076 -5.7570144 0.0000000
availability_365 0.1382717 0.0060919 22.6977063 0.0000000
number_of_reviews -0.1280636 0.0164372 -7.7910949 0.0000000
minimum_nights -0.1568261 0.0495749 -3.1634158 0.0015636
markets_nn1 -0.0011827 0.0025011 -0.4728608 0.6363219
metrostops_nn2 0.0044905 0.0049318 0.9105080 0.3625747
students_nn2 -0.0017175 0.0026028 -0.6598707 0.5093507
wallart_nn3 -0.0089450 0.0026235 -3.4095056 0.0006532
playgrounds_nn2 0.0108027 0.0052286 2.0660540 0.0388469
histbuild_nn3 -0.0012133 0.0025760 -0.4709884 0.6376584
monuments_nn3 -0.0084732 0.0034659 -2.4447569 0.0145106
restaurant_nn3 -0.0045769 0.0066834 -0.6848188 0.4934728
university_nn1 -0.0006791 0.0021630 -0.3139621 0.7535558
schools_nn2 -0.0031132 0.0038449 -0.8097174 0.4181202
retail_nn3 0.0029055 0.0023326 1.2456419 0.2129226
industrial_nn1 0.0024174 0.0040976 0.5899392 0.5552436
luxuryyes 28.6881468 2.7570447 10.4053977 0.0000000
lagPrice 0.0265946 0.0158534 1.6775270 0.0934680
canalyes 11.3983534 2.3607928 4.8281888 0.0000014
expamenyes 5.3517563 1.3694240 3.9080345 0.0000936
expcodesyes -0.4872572 1.4109252 -0.3453459 0.7298410
citycenterdescyes -1.7537299 1.4891194 -1.1776959 0.2389435
cheapcodesyes -2.6772635 1.2502492 -2.1413840 0.0322651
poolyes 21.7950496 9.6114645 2.2676096 0.0233724
sig_cell -0.0147106 0.0028141 -5.2273638 0.0000002
beds 2.9734691 0.8456868 3.5160405 0.0004398
review_scores_accuracy 1.2845762 1.1772055 1.0912081 0.2752053
review_scores_cleanliness 2.8346929 0.8791821 3.2242387 0.0012668
review_scores_rating 0.8395258 0.1532516 5.4780870 0.0000000
review_scores_communication -0.3453102 1.2328056 -0.2801011 0.7794052
review_scores_location 1.8173183 0.9795064 1.8553409 0.0635744
review_scores_value -5.8769957 0.9932434 -5.9169740 0.0000000
reviews_per_month -1.3353345 0.5753636 -2.3208533 0.0203130
expamencatyes -9.4729312 8.2304316 -1.1509641 0.2497722
expsummaryyes 2.5047093 1.4977348 1.6723317 0.0944875
expdescripyes -21.1388224 9.3696186 -2.2561028 0.0240839
expneighbdescyes -2.5734512 2.3710858 -1.0853471 0.2777919
expneighwoclouyes 7.9299603 4.5496874 1.7429682 0.0813673
guests_included -0.3049352 0.7859769 -0.3879696 0.6980461
cancellation_policymoderate -0.2415770 1.5539666 -0.1554583 0.8764629
cancellation_policystrict_14_with_grace_period 4.9086469 1.5660670 3.1343786 0.0017267
cancellation_policysuper_strict_60 59.4161295 14.8311879 4.0061612 0.0000621
size_of_groupSolo 56.6031507 7.0124621 8.0717942 0.0000000
expneighborsCheap Neighbors 3.8931421 2.2988212 1.6935385 0.0903815
expneighborsExpensive Neighbors 6.5321470 3.5321645 1.8493326 0.0644368
hotspotlevelNot Hot Spot 3.2473808 4.0911219 0.7937629 0.4273507
hotspotlevelSemi-Hot -3.7868239 2.8549585 -1.3264024 0.1847341
R2 0.516
Adjusted R2 0.5109
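
To make the columns of the table concrete, the sketch below recomputes the t value and an approximate p-value for the bathrooms row from its estimate and standard error. The residual degrees of freedom are not reported in this summary, so the p-value uses a normal approximation (an assumption that is reasonable only for a large sample) and will differ slightly from the reported value.

```r
# Values copied from the bathrooms row of the regression summary above.
estimate  <- 2.2138821
std_error <- 0.5536150

# The t value is the estimate divided by its standard error.
t_value <- estimate / std_error

# Two-sided p-value under a normal approximation; the table's own p-value
# comes from a t distribution whose degrees of freedom are not shown here.
p_approx <- 2 * pnorm(-abs(t_value))

t_value
p_approx
```

The recomputed t value matches the reported 3.9989564, and the approximate p-value is close to the reported 0.0000640.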

Cross Validation

This section looks at both model accuracy and generalizability. Accuracy is measured by how closely our predictions match observed rental prices in the test set. Generalizability is measured with k-fold cross validation, which shows how consistently the model predicts on new data across many holdout sets. Understanding both is critical to ensuring that Checkbnb users get an accurate predicted rental price no matter their home specifications or location.

Model Accuracy

In the process of honing and engineering new variables for our model, we continually improved its accuracy. As shown in the table below, our final model predicts on the test set with a mean absolute error (MAE) of $38. This means that, on average, our predictions are off by $38 in either direction. The mean absolute percent error (MAPE) of our model is approximately 26%.
MAE and MAPE for Test Set Data
Regression MAE MAPE
Baseline Regression 37.94078 0.2566976
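
For readers unfamiliar with the two metrics, here is a minimal sketch of how MAE and MAPE are computed. The `actual` and `pred` vectors are made-up numbers for illustration, not values from the Checkbnb test set.

```r
# Hypothetical observed and predicted nightly prices.
actual <- c(100, 150, 200, 250)
pred   <- c(110, 140, 230, 200)

# Absolute error per listing, regardless of over- or under-prediction.
abs_error <- abs(pred - actual)

# MAE: average absolute error, in the same units as price.
mae <- mean(abs_error)

# MAPE: average absolute error as a share of the actual price.
mape <- mean(abs_error / actual)

mae   # 25
mape  # about 0.129, i.e. roughly 13%
```

MAE is easiest to read in price units, while MAPE puts errors on cheap and expensive listings on a comparable scale.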

The plots below show how well our model is predicting rental prices in both our training and test set data. The orange line indicates a perfect fit, meaning that predicted rental prices equal the actual values. As seen in the bottom right scatterplot, we are almost perfectly predicting rental prices in accordance with their actual values in our training data. Our model does slightly underpredict on higher-priced rentals in our testing data set.

# Compare mean observed and mean predicted price within each decile of observed price
preds %>%
  mutate(price_Decile = ntile(actual, 10)) %>%
  group_by(price_Decile) %>%
  summarize(meanObserved = mean(actual, na.rm = TRUE),
            meanPrediction = mean(pred, na.rm = TRUE)) %>%
  gather(Variable, Value, -price_Decile) %>%
  ggplot(aes(price_Decile, Value, shape = Variable)) +
    geom_point(size = 2) +
    # Vertical segment joins the observed and predicted means in each decile
    geom_path(aes(group = price_Decile), colour = "black") +
    scale_shape_manual(values = c(2, 17)) +
    labs(title = "Predicted and observed Price by observed Price Decile")

Model Generalizability

We then tested the generalizability of our model using k-fold cross validation. In this analysis, we split our data into 100 groups (folds): one group acts as the test set while the remaining 99 groups act as the training set. This process is repeated for each individual group, resulting in 100 “scores” telling us how well our model predicted for each sample of new data.

Our average MAE of $38 across folds is similar to the MAE of our initial test, and the standard deviation of $3.91 suggests there is little variation across our 100 groups. The histogram in Figure 9 below confirms this, showing a relatively narrow distribution of errors. This indicates that our model is relatively generalizable to new rental price data.

fitControl <- trainControl(method = "cv", 
                           number = 100,
                           savePredictions = TRUE)

set.seed(717)

reg1.cv <- 
  train(price2 ~ ., data = st_drop_geometry(rents) %>% 
          dplyr::select(price2, property_type, accommodates, bathrooms, bedrooms, neighbourhood, room_type, availability_365, number_of_reviews, minimum_nights, Entire_Home, markets_nn1, metrostops_nn2, students_nn2, wallart_nn3, playgrounds_nn2, histbuild_nn3, monuments_nn3, restaurant_nn3, university_nn1, schools_nn2, retail_nn3, industrial_nn1, luxury, lagPrice, canal, expamen, expcodes,citycenterdesc, cheapcodes), 
        method = "lm", 
        trControl = fitControl, 
        na.action = na.pass)

#Histogram of MAE across the cross-validation folds
reg1.cv.resample <- reg1.cv$resample

ggplot(reg1.cv.resample, aes(x=MAE)) + geom_histogram(color = "#ff5a5f", fill = "#f6c192", alpha = .7, bins = 50) + 
  labs(title="Histogram of Mean Absolute Error Across 100 Folds",
       subtitle = "Figure 9") +
  plotTheme()
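
The fold-by-fold procedure and the summary statistics quoted above (mean and standard deviation of MAE) can be sketched in base R as follows. This is an illustration under stated assumptions rather than the report's code: the `toy` data are simulated, only one predictor is used, and `k` is set to 10 instead of 100 to keep the example small.

```r
# Simulated stand-in data (hypothetical); price depends on bedrooms plus noise.
set.seed(717)
n   <- 300
toy <- data.frame(bedrooms = sample(1:4, n, replace = TRUE))
toy$price2 <- 60 + 40 * toy$bedrooms + rnorm(n, sd = 30)

# Randomly assign every row to one of k folds of near-equal size.
k       <- 10
fold_id <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other k - 1 folds, then score on the held-out fold.
fold_mae <- sapply(seq_len(k), function(i) {
  fit      <- lm(price2 ~ bedrooms, data = toy[fold_id != i, ])
  held_out <- toy[fold_id == i, ]
  mean(abs(predict(fit, newdata = held_out) - held_out$price2))
})

# Average error across folds and how much it varies from fold to fold.
mean(fold_mae)
sd(fold_mae)
```

With the caret object used in this report, the equivalent summaries would presumably be `mean(reg1.cv.resample$MAE)` and `sd(reg1.cv.resample$MAE)`, since the `MAE` column is the one plotted in the histogram.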

cv_preds <- reg1.cv$pred

map_preds <- rents %>% 
  rowid_to_column(var = "rowIndex") %>% 
  left_join(cv_preds, by = "rowIndex") %>% 
  mutate(rentPrice.AbsError = abs(pred - obs),
         PercentError = (rentPrice.AbsError / price2)*100) 

map_preds_new2 <- st_join(neighborhoods_new.sf, map_preds, left = TRUE)

map_preds_sum <- map_preds_new2 %>% 
  group_by(neighbourhood.x) %>% 
  summarise(meanMAE = mean(rentPrice.AbsError),
            meanMAPE = mean(PercentError))

map_preds_sum %>%
  group_by(neighbourhood.x) %>%
    st_sf() %>%
    ggplot() + 
      geom_sf(data = map_preds_sum, fill = "grey40") +
      geom_sf(aes(fill = meanMAPE)) +
      scale_fill_gradient(low = paletteblues[1], high = paletteblues[5],
                        name = "meanMAPE") +
    labs(title = "Mean MAPE by neighborhood",
         subtitle = "Figure 11") +
    mapTheme()

Conclusion

To optimize the user experience for the Checkbnb customer, we successfully honed a relatively accurate and generalizable model that minimizes price error. Our goal at Checkbnb is for each user to get an accurate predicted price for their individual home so that they can appropriately list and rent their units in a timely fashion. We were able to use and engineer a diverse set of variables that accounts for the spatial process at the neighborhood scale and overall variation in price. Under a stringent and complicated regulatory landscape, this model provides a tool that helps users effectively navigate the Airbnb rental market that is otherwise less accessible in Amsterdam. Airbnb wants to make the prospect of becoming a host as simple and streamlined as possible, and we believe our model effectively achieves this goal.

To improve the analysis, more up-to-date Airbnb data is required, since we don’t know how the City's regulatory changes may have affected the rental market. For example, because the city banned Airbnbs from three city center neighborhoods, reduced supply may have pushed rental prices up elsewhere. There is also an opportunity to make further use of neighborhood demographics, socioeconomic indicators, and other census variables that we were unable to acquire for this particular study. Access to this data would allow us to better generalize across different neighborhood contexts and pinpoint any spatial biases in our model.